Semi-supervised graph partitioning with decision trees.

Authors

  • Timothy Hancock
  • Hiroshi Mamitsuka
Abstract

In this paper we investigate a new framework for graph partitioning that uses decision trees to search for sub-graphs within a graph adjacency matrix. Graph partitioning by a decision tree seeks to optimize a specified graph partitioning index, such as ratio cut, by recursively applying decision rules found within nodes of the graph. The key advantages of tree models for graph partitioning are that they provide a predictive framework for evaluating the quality of the solution, determining the number of sub-graphs, and assessing overall variable importance. We evaluate the performance of tree-based graph partitioning on a benchmark dataset for multiclass classification of tumor diagnosis based on gene expression. Three graph cut indices (ratio cut, normalized cut, and network modularity) are compared in terms of their classification accuracy, their power to estimate the optimal number of sub-graphs, and their ability to extract known important variables within the dataset.
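The three indices named in the abstract can be illustrated with a minimal sketch that scores a fixed partition of a small graph. It is not the authors' tree-based method. It uses common textbook definitions (ratio cut as the sum of cut weight over cluster size, normalized cut as the sum of cut weight over cluster volume, and Newman-style modularity), which may differ from the paper's exact formulation. The function name `partition_indices` and the toy graph are assumptions made for illustration only.

```python
def partition_indices(adj, labels):
    """Score a partition of an undirected graph.

    adj    : symmetric adjacency matrix (list of lists), assumed to have no self-loops
    labels : cluster label for each node
    Returns (ratio_cut, normalized_cut, modularity) under the textbook
    definitions described above; every cluster is assumed to have nonzero degree.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight

    ratio_cut = 0.0
    normalized_cut = 0.0
    modularity = 0.0
    for c in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == c]
        volume = sum(degree[i] for i in members)
        # Summing over ordered pairs counts each internal edge twice.
        internal = sum(adj[i][j] for i in members for j in members)
        cut = volume - internal  # weight of edges leaving the cluster

        ratio_cut += cut / len(members)
        normalized_cut += cut / volume
        modularity += internal / two_m - (volume / two_m) ** 2
    return ratio_cut, normalized_cut, modularity


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

labels = [0, 0, 0, 1, 1, 1]  # one cluster per triangle
ratio_cut, normalized_cut, modularity = partition_indices(adj, labels)
print(ratio_cut, normalized_cut, modularity)
```

Lower ratio cut and normalized cut indicate a better partition, while higher modularity does, so a search procedure would minimize the first two and maximize the third.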


Related resources

A Combinatorial View of Graph Laplacians

Discussions about different graph Laplacians—mainly the normalized and unnormalized versions of graph Laplacian—have been ardent with respect to various methods of clustering and graph based semi-supervised learning. Previous research in the graph Laplacians, from a continuous perspective, investigated the convergence properties of the Laplacian operators on Riemannian Manifolds. In this paper,...


Building Classifiers With Unrepresentative Training Instances: Experiences From The KDD Cup 2001 Competition

In this paper we discuss our experiences in participating in the KDD Cup 2001 competition. The task involved classifying organic molecules as either active or inactive in their binding to a receptor. The classification task presented three challenges: highly skewed class distribution, large number of features exceeding training set size by two orders of magnitude, and non-representative trainin...


A Combinatorial View of the Graph Laplacians

Discussions about different graph Laplacians, mainly normalized and unnormalized versions of the graph Laplacians, have been ardent with respect to various methods in clustering and graph based semi-supervised learning. Previous research on the graph Laplacians investigated their convergence properties to Laplacian operators on continuous manifolds. There is still no strong proof on convergence...


Transactions on Machine Learning and Data Mining Editorial

In this volume we have to discuss two papers: Both papers are very interesting, innovative and clearly written. These papers have something in common. This is the use of specific elements and the search for them. Such elements are the starting point for learning structures and replace simple but complex searches. Both papers also make use of graphs for the representation. Related problems have ...


Efficient Learning of Random Forest Classifier using Disjoint Partitioning Approach

Random Forest is an Ensemble Supervised Machine Learning technique. Research work in the area of Random Forest aims at either improving accuracy or improving performance. In this paper we are presenting our research towards improvement in learning time of Random Forest by proposing a new approach called Disjoint Partitioning. In this approach, we are using disjoint partitions of training datase...



Journal title:
  • Genome informatics. International Conference on Genome Informatics

Volume 20

Publication date: 2008